Title
Improving Abstractive Summarization by Training Masked Out-of-Vocabulary Words
Author
Tae-Seok Lee, Hyun-Young Lee, Seung-Shik Kang
Citation
Vol. 18, No. 3, pp. 344-358 (June 2022)
English Abstract
Text summarization is the task of producing a shorter version of a long document while accurately preserving its main content. Abstractive summarization generates novel words and phrases with a language generation method, drawing on text transformation and pre-embedded word information. However, newly coined or out-of-vocabulary words degrade the performance of automatic summarization because they are not pre-trained in the machine learning process. In this study, we demonstrated an improvement in summarization quality through the contextualized embedding of BERT with out-of-vocabulary masking. In addition, by explicitly providing precise pointing and an optional copy instruction along with the BERT embedding, we achieved higher accuracy than the baseline model. The recall-based word-generation metric ROUGE-1 score was 55.11, and the word-order-based ROUGE-L score was 39.65.
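
The out-of-vocabulary masking described in the abstract can be illustrated with a short sketch. This is a minimal, hypothetical example assuming the HuggingFace transformers library; the checkpoint name "bert-base-multilingual-cased" and the function name are illustrative stand-ins, not the paper's actual model. The idea is to replace every [UNK] position with [MASK] so that BERT infers a context-dependent vector for the unknown word instead of returning the single shared [UNK] embedding.

```python
import torch
from transformers import BertModel, BertTokenizer

# Hypothetical checkpoint; the paper's actual model is not specified here.
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertModel.from_pretrained("bert-base-multilingual-cased")
model.eval()

def contextualized_oov_embeddings(text: str) -> torch.Tensor:
    """Return BERT hidden states with OOV positions masked out."""
    # WordPiece falls back to [UNK] for tokens it cannot cover.
    enc = tokenizer(text, return_tensors="pt")
    input_ids = enc["input_ids"]
    # Replace each [UNK] with [MASK] so BERT predicts a contextual
    # vector at that position rather than the generic [UNK] embedding.
    oov_positions = input_ids == tokenizer.unk_token_id
    masked_ids = input_ids.masked_fill(oov_positions, tokenizer.mask_token_id)
    with torch.no_grad():
        out = model(input_ids=masked_ids, attention_mask=enc["attention_mask"])
    return out.last_hidden_state  # shape: (1, seq_len, hidden_size)
```

The masked-position vectors then stand in for the OOV words' representations when the summarizer encodes the source document.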
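
The abstract's "precise pointing and an optional copy instruction" points to a copy mechanism over the source text. The sketch below shows the standard pointer-generator mixture (in the style of See et al.), not the paper's exact Selective OOV Copy Model; all function names and tensor shapes are assumptions for illustration. Copying lets the decoder emit a source-only OOV word verbatim by routing attention mass onto an extended vocabulary id.

```python
import torch

def mix_copy_distribution(vocab_logits, attn_weights, src_ids, p_gen, n_src_oov):
    """Combine generation and copying (pointer-generator style sketch).

    vocab_logits: (B, V)  decoder scores over the fixed vocabulary
    attn_weights: (B, S)  attention over source positions (rows sum to 1)
    src_ids:      (B, S)  source token ids; source-only OOV words get
                          extended ids in [V, V + n_src_oov)
    p_gen:        (B, 1)  probability of generating rather than copying
    """
    # Generation branch: vocabulary distribution weighted by p_gen.
    p_vocab = torch.softmax(vocab_logits, dim=-1) * p_gen
    # Extend with zero columns so extended OOV ids have somewhere to land.
    extra = torch.zeros(p_vocab.size(0), n_src_oov, device=p_vocab.device)
    extended = torch.cat([p_vocab, extra], dim=1)  # (B, V + n_src_oov)
    # Copy branch: add each source position's attention mass onto the id
    # of the token there, so an OOV word can be copied verbatim.
    return extended.scatter_add(1, src_ids, attn_weights * (1.0 - p_gen))
```

Under this mixture, a word absent from the vocabulary can still receive probability mass through its extended id whenever the attention points at it in the source.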
Keywords
BERT, Deep Learning, Generative Summarization, Selective OOV Copy Model, Unknown Words